NSF PAR Search | NSF Public Access Repository

The Design of Replication Studies

https://doi.org/10.1111/rssa.12688

Hedges, Larry V.; Schauer, Jacob M. (March 2021, Journal of the Royal Statistical Society Series A: Statistics in Society)

Abstract. Empirical evaluations of replication have become increasingly common, but there has been no unified approach to doing so. Some evaluations conduct only a single replication study while others run several, usually across multiple laboratories. Designing such programs has largely contended with difficult issues about which experimental components are necessary for a set of studies to be considered replications. However, another important consideration is that replication studies be designed to support sufficiently sensitive analyses. For instance, if hypothesis tests are to be conducted about replication, studies should be designed to ensure these tests are well-powered; if not, it can be difficult to determine conclusively if replication attempts succeeded or failed. This paper describes methods for designing ensembles of replication studies to ensure that they are both adequately sensitive and cost-efficient. It describes two potential analyses of replication studies (hypothesis tests and variance component estimation) and approaches to obtaining optimal designs for them. Using these results, it assesses the statistical power, precision of point estimators and optimality of the design used by the Many Labs Project and finds that while it may have been sufficiently powered to detect some larger differences between studies, other designs would have been less costly and/or produced more precise estimates or higher-powered hypothesis tests.
An evaluation of statistical methods for aggregate patterns of replication failure

https://doi.org/10.1214/20-AOAS1387

Schauer, Jacob M.; Fitzgerald, Kaitlyn G.; Peko-Spicer, Sarah; Whalen, Mena C.; Zejnullahi, Rrita; Hedges, Larry V. (March 2021, The Annals of Applied Statistics)

Full Text Available
The Statistics of Replication

https://doi.org/10.1027/1614-2241/a000173

Hedges, Larry V. (October 2019, Methodology)

Abstract. The concept of replication is fundamental to the logic and rhetoric of science, including the argument that science is self-correcting. Yet there is very little literature on the methodology of replication. In this article, I argue that the definition of replication should not require underlying effects to be identical, but should permit some variation in true effects to be allowed. I note that different possible analyses could be used to determine whether studies replicate. Finally, I argue that a single replication study is almost never adequate to determine whether a result replicates. Thus, methodological work on the design of replication studies would be useful.
Full Text Available
Consistency of effects is important in replication: Rejoinder to Mathur and VanderWeele (2019).

https://doi.org/10.1037/met0000237

Hedges, Larry V.; Schauer, Jacob M. (October 2019, Psychological Methods)

Full Text Available
More Than One Replication Study Is Needed for Unambiguous Tests of Replication

https://doi.org/10.3102/1076998619852953

Hedges, Larry V.; Schauer, Jacob M. (June 2019, Journal of Educational and Behavioral Statistics)

The problem of assessing whether experimental results can be replicated is becoming increasingly important in many areas of science. It is often assumed that assessing replication is straightforward: All one needs to do is repeat the study and see whether the results of the original and replication studies agree. This article shows that the power of the statistical test for whether two studies obtain the same effect is smaller than the power of either study to detect an effect in the first place. Thus, unless the original study and the replication study have unusually high power (e.g., power of 98%), a single replication study will not have adequate sensitivity to provide an unambiguous evaluation of replication.
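The power comparison in this abstract can be illustrated with a minimal sketch. It is not the authors' code or their exact derivation. It assumes two studies whose effect estimates are normal with the same known sampling variance, two-sided z-tests at level 0.05, and a true difference between the studies as large as the original effect itself (that is, the replication's true effect is zero). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def two_sided_power(noncentrality, alpha=0.05):
    """Power of a two-sided z-test whose statistic has the given mean."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower_tail = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


def noncentrality_for_power(power, alpha=0.05):
    """Find, by bisection, the noncentrality that yields the given power."""
    low, high = 0.0, 10.0
    for _ in range(100):
        mid = (low + high) / 2
        if two_sided_power(mid, alpha) < power:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def difference_test_power(single_study_power, alpha=0.05):
    """Power of the test that two studies share the same effect.

    Assumption (illustrative only): both studies have sampling variance v,
    and the true difference equals the original effect theta. The difference
    of the two estimates then has variance 2v, so the noncentrality shrinks
    from theta / sqrt(v) to theta / sqrt(2v).
    """
    single_study_ncp = noncentrality_for_power(single_study_power, alpha)
    return two_sided_power(single_study_ncp / sqrt(2), alpha)


if __name__ == "__main__":
    for study_power in (0.80, 0.90, 0.98):
        diff_power = difference_test_power(study_power)
        print(f"single-study power {study_power:.2f} -> "
              f"difference-test power {diff_power:.2f}")
```

Under these assumptions the difference test is always less powerful than either study, and a single-study power of 0.98 gives a difference-test power only a little above 0.8, which is in line with the 98% figure quoted in the abstract.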
Full Text Available
